Multi-faceted search

ABSTRACT

A facility for representing a set of items, each potentially having a value for each of a group of attributes, is described. The items are represented in a database made up of two or more discrete components. Each component corresponds to a proper subset of the group of attributes, and represents, for every item of the set, the values of its proper subset of attributes. Every component is organized such that data items are represented within it in the same order.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a divisional of U.S. patent application Ser. No. 13/938,531, filed on Jul. 10, 2013, now U.S. Pat. No. 9,424,305, which is a divisional of U.S. patent application Ser. No. 11/943,695, filed on Nov. 21, 2007, now U.S. Pat. No. 8,510,349, which claims the benefit of U.S. Provisional Patent Application No. 60/873,618, filed on Dec. 6, 2006, each of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The described technology is directed to the field of query resolution techniques.

BACKGROUND

A user submits a search query in order to identify, among a set of data items, data items having certain characteristics. For example, it is common for users to query a relational database by submitting a query that specifies values of one or more fields present in the database, and receive in return a query result listing records in the database that contain the specified values in the specified fields. Queries may either be applied directly against the authoritative data source containing information about the set of data items, or against a separate index that is optimized for handling certain kinds of queries.

In the case of some sets of data items, the data items have attributes of different types that all may be the subject of a query. For example, in addition to relational fields, some conventional database engines support the storage of geographic locations for data items. In such a case, two separate indices are constructed: a relational index whose structure is tailored to identifying data items based upon their relational field contents, and a geographic index, such as an R-tree, whose structure is tailored to identifying data items based upon their geographic locations. A query specifying relational attributes alone is typically processed solely against the relational index, while a query specifying geographic attributes alone is typically processed solely against the geographic index.

In conventional database systems, a query that specifies attributes of multiple types, sometimes called a “hybrid query,” is first processed against the index appropriate to each attribute type. In the above example, a hybrid query specifying both relational and geographic attributes would be processed independently against both the relational and geographic indices. Each of the indices produces an intermediate query result, sometimes called a “constituent query result,” identifying all of the data items having the specified attributes of the attribute type represented in the index, irrespective of whether they have the attributes of attribute types not represented in the index. In order to obtain a final query result from the constituent query results, the constituent query results must be joined, or “intersected,” so that the final query result contains only data items present in each of the constituent query results. Joining groups of data items such as those contained in the constituent query results is much more efficient if the data items in each group occur in the same order as in the other groups. Because the different indices used to represent the different types of attributes usually have different structures to more effectively identify data items based upon their different attribute types, however, the constituent query results they produce tend to list items in different orders. Accordingly, in the conventional approach, the constituent query results must all be sorted into a common order before joining.

This process is illustrated in FIG. 1. FIG. 1 is a data flow diagram showing a conventional process for processing a hybrid query. First, indices 111-113, each representing different attribute types, are initially built and then maintained to reflect changes in the data source. Second, a query 120 received from the user is applied simultaneously against all of the indices to obtain a constituent query result for each of the indices, here constituent query results 131-133. Third, each constituent query result is normalized, such as by sorting it to obtain a normalized query result, here normalized query results 141-143. Finally, the normalized constituent query results are intersected, such as by joining them, to obtain a final query result 150.

Unfortunately, sorting the constituent query results before joining them is often an expensive operation, consuming significant computing resources. Accordingly, an approach to processing a hybrid query without sorting constituent query results would have significant utility.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a data flow diagram showing a conventional process for processing a hybrid query.

FIG. 2 is a high-level data flow diagram showing data flow within a typical arrangement of components used to provide the facility.

FIG. 3 is a block diagram showing some of the components typically incorporated in at least some of the computer systems and other devices on which the facility executes.

FIG. 4 is a data flow diagram showing a typical process for processing a hybrid query performed by the facility.

FIGS. 5A-5C are table diagrams showing a text index, a geographic index, and a relational index.

FIG. 6 is a flow diagram showing steps typically performed by the facility in order to process a hybrid query.

FIGS. 7A-7C show text, geographic, and relational constituent results.

FIG. 8 is a table diagram showing a final result generated by the facility for the sample query shown in Table 2 by joining the constituent results shown in FIGS. 7A-7C.

FIGS. 9-13 show sample displays presented as part of the user interface.

FIGS. 14-16 are flow diagrams showing steps typically performed by the facility in order to present the query specification user interface described above.

FIG. 17 is a flow diagram showing steps typically performed by the facility in order to select the items to be displayed in a particular page of a search result.

DETAILED DESCRIPTION

A software facility for handling search queries (“the facility”) is described. In some embodiments, the facility uses an index, sometimes referred to as a “compound index,” that is specially adapted to resolving hybrid queries that specify two or more different kinds of search criteria to identify data items in a database satisfying the criteria included in the query.

For example, in some embodiments, the facility uses a compound index to resolve queries including criteria of any of the following three types: (1) textual criteria that specify textual attributes of a data item; (2) relational criteria that specify relational attributes of a data item; and (3) geographic criteria specifying geographic location attributes of a data item. In some embodiments, the compound index is made up of (1) an inverted index representing the textual attributes of each data item; (2) an indexed relational database made up of one or more tables representing the relational attributes of each data item; and (3) a geopoint table representing the geographic attributes of each data item, where item IDs that identify items in the inverted index correspond to the order of rows representing items in the tables of the relational database as well as the order of rows representing items in the geopoint table. When textual criteria from the query are applied against the inverted index, relational criteria from the query are applied to the relational database, and geographic criteria from the query are applied to the geopoint table, the resulting three constituent result sets can be joined together to form a final result set without having to first sort any of the constituent result sets, as each constituent result set is ordered in accordance with the item IDs.

In some embodiments, the facility provides a user interface that permits a user to create a query by specifying values or ranges of values (hereafter “values”) for each of a number of item attributes. Some of these values are always displayed within the user interface, while others are displayed only when a drop-down menu containing them is selected by the user. Whenever a new value is specified for an attribute, the facility (1) processes a query selecting items having the specified values and/or ranges of values to obtain a result set; (2) identifies any attribute whose values are displayed but for which no value has yet been specified; (3) in a single pass through the result set, counts the number of items having each of the displayed attribute values; and (4) displays the count for each of the displayed attribute values next to the attribute value. In the case of some attributes whose values are widely variable, the facility (5) establishes a large number of “bins” (such as 50 bins), each corresponding to a small range of values of the attribute; (6) as part of (3), for each bin, counts the number of items having a value of the attribute within the range for the bin; (7) collapses the large number of bins to a smaller number of bins, such as four bins, each containing a roughly similar number of items and generally corresponding to larger ranges than the original bins; and (8) in (4), displays the attribute value ranges and counts for the collapsed bins. This approach provides a powerful query specification user interface while consuming reasonable quantities of computing resources.

In some embodiments, the facility uses a paging technique to display the results generated for a search query. Where a user has requested the display of n items per page in a paged search result, each time the user requests the mth page of a search result, the facility (1) reruns the query on which the search result is based; (2) performs a repeatable, or “stable,” sort to populate, but not internally sort, each page up to and including the mth page; and (3) internally sorts and displays the data items populated into the mth page. This overcomes the problem of having to expend extraordinary computing resources to avoid unstable paged result sets, in which items of the search result having the same value of the attribute on which the search result sort is based that span a page boundary may be seen to appear on both of the two pages separated by the page boundary.

FIG. 2 is a high-level data flow diagram showing data flow within a typical arrangement of components used to provide the facility. A number of web client computer systems 210 that are under user control generate and send page view requests 231 to a logical web server 200 via a network such as the Internet 220. These requests typically include page view requests and other requests of various types relating to formulating queries, executing queries, and/or displaying and/or paging query results. Within the web server, these requests may either all be routed to a single web server computer system, or may be load-balanced among a number of web server computer systems. The web server typically replies to each request with a served page 232.

While various embodiments are described in terms of the environment described above, those skilled in the art will appreciate that the facility may be implemented in a variety of other environments including a single, monolithic computer system, as well as various other combinations of computer systems or similar devices connected in various ways. In various embodiments, a variety of computing systems or other different client devices may be used in place of the web client computer systems, such as mobile phones, personal digital assistants, televisions, cameras, etc.

FIG. 3 is a block diagram showing some of the components typically incorporated in at least some of the computer systems and other devices on which the facility executes. These computer systems and devices 300 may include one or more central processing units (“CPUs”) 301 for executing computer programs; a computer memory 302 for storing programs and data while they are being used; a persistent storage device 303, such as a hard drive, for persistently storing programs and data; a computer-readable media drive 304, such as a CD-ROM drive, for reading programs and data stored on a computer-readable medium; and a network connection 305 for connecting the computer system to other computer systems, such as via the Internet. While computer systems configured as described above are typically used to support the operation of the facility, those skilled in the art will appreciate that the facility may be implemented using devices of various types and configurations, and having various components.

FIG. 4 is a data flow diagram showing a typical process for processing a hybrid query performed by the facility. First, as is shown and described further below, the facility builds and maintains each of the indices (here, indices 411-413) in a normalized form, such as by representing the data items in the same order in each of the indices. Second, the facility applies a query 420 received from the user against all the indices to obtain a constituent query result for each of the indices, here constituent query results 431-433. Because of the manner in which the indices were built and maintained, these constituent query results are already in normalized form, and it is not necessary to incur the extra cost in computing resources of sorting them. Third, the facility intersects the constituent query results to obtain a final query result 450.

Table 1 below shows a sample data item among a set of data items searched by the facility. In this case, the sample data item contains various kinds of information about a home.

TABLE 1
home id: 20
address: 1539 NW 58th St, Seattle, WA 98107
location: 47.670820, −122.376557
make me move: No
for sale: No
recently sold: No
price: $448,310
bedrooms: 3
bathrooms: 1
size: 1,370 sq. ft.
lot: 6,453 sq. ft.
type: single family
sale date: Oct. 10, 1999
description: Cozy bungalow on quiet street. You'll love how the afternoon sun filters into the back yard.

The home has an identifier of 20, and a street address as shown. The home further has a location identified by the shown latitude and longitude values. The home's make me move, for sale, and recently sold statuses are all no. The home's price is shown, as are its number of bedrooms and bathrooms, its floor area and lot size, its type, and its sale date. Further, a narrative description is shown for the home.

FIGS. 5A-5C show different indices maintained on a group of home data items including the one described in Table 1. FIG. 5A is a table diagram showing a text index 510 used to identify home data items among the set of home data items having particular words in their textual descriptions. The text index is made up of rows, including shown rows 521-530, each representing the occurrence of a single word in the textual description of a single home, and each divided into the following columns: a term column 511 containing the word, and a home id column 512 containing the home id of a home data item containing the word in its textual description. For example, row 525 indicates that the word “cozy” is contained in the textual description of the home data item having home id 20. It can be seen that the rows of the index for each word (e.g., rows 521-522 for the word “cottage,” rows 523-528 for the word “cozy,” and rows 529-530 for the word “cubbies”) occur in increasing order of home id.
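By way of illustration only, a text index of the kind shown in FIG. 5A might be sketched as follows. The descriptions for homes other than home 20, the tokenization rule, and the names `DESCRIPTIONS` and `build_text_index` are assumptions made for this sketch and are not taken from the figures.

```python
import re
from collections import defaultdict

# Hypothetical descriptions keyed by home id. Only the text for home 20 comes
# from Table 1; the other entries are invented for illustration.
DESCRIPTIONS = {
    19: "Cozy cottage with built-in cubbies.",
    20: "Cozy bungalow on quiet street. You'll love how the afternoon sun "
        "filters into the back yard.",
    49: "Cozy craftsman near the park.",
    7: "Sunny cottage with a large lot.",
}


def build_text_index(descriptions):
    """Map each word to the list of home ids whose description contains it."""
    index = defaultdict(list)
    # Visiting homes in increasing id order keeps every per-word list sorted
    # by home id, which is the property the facility relies on.
    for home_id in sorted(descriptions):
        # Assumed tokenization: lowercase runs of letters, one entry per
        # distinct word per home.
        words = set(re.findall(r"[a-z]+", descriptions[home_id].lower()))
        for word in words:
            index[word].append(home_id)
    return index


text_index = build_text_index(DESCRIPTIONS)
print(text_index["cozy"])
print(text_index["cottage"])
```

Because each per-word list is built in increasing order of home id, selecting the rows for a term yields a constituent result that is already in the common order.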

While FIG. 5A and each of the table diagrams discussed below show a table whose contents and organization are designed to make them more comprehensible by a human reader, those skilled in the art will appreciate that actual data structures used by the facility to store this information may differ from the table shown, in that they, for example, may be organized in a different manner; may contain more or less information than shown; may be compressed and/or encrypted; etc.

FIG. 5B shows a geographic index 540 maintained by the facility on the set of home data items. The geographic index is made up of rows, including shown rows 551-555, each corresponding to a different home data item, and each divided into the following columns: a home id column containing the home id for the home data item; a latitude column 542 containing the latitude value for the home; and a longitude column 543 containing a longitude value for the home. For example, it can be seen that row 553 indicates that the home data item having home id 20 has a latitude value of 47.670820 and a longitude value of −122.376557. It can be seen that the rows are ordered in increasing order of the home id column. In some embodiments, the facility maintains this order by adding each new data item to the end of the geographic index with a home id that is larger than the largest existing home id.

FIG. 5C shows a relational index 560 used to identify home data items having particular relational values. The relational index is made up of rows such as shown rows 581-585, each of which corresponds to a different home data item and is divided into the following columns: a home id column 561 containing a home id for the home; a for sale column 562 indicating whether the home presently has a for sale status; a make me move column 563 indicating whether the home presently has a make me move status; a recently sold column 564 that indicates whether the home presently has a recently sold status; a price column 565 indicating a price for the home; a beds column 566 indicating the number of bedrooms in the home; a baths column 567 showing the number of bathrooms in the home; a size column 568 showing a measurement of the floor area of the home; a lot column 569 showing a measurement of the area of the home's lot; a type column 570 indicating the type of the home; and a sale date column 571 indicating the last date on which the home was sold. For example, it can be seen from row 583 that the home having home id 20 does not presently have the for sale, make me move, or recently sold properties; has a price of $348,310; has 3 bedrooms and 1 bathroom; has a floor area of 1,370 square feet and a lot size of 6,453 square feet; is a single family home; and was last sold on Oct. 10, 1999. It can be seen that the rows are ordered in increasing order of the home id. In some embodiments, the facility achieves this result by synchronizing the rows of the relational index with the rows of the geographic index shown in FIG. 5B.

FIG. 6 is a flow diagram showing steps typically performed by the facility in order to process a hybrid query. In step 601, the facility receives the query, which specifies two or more types of criteria. An example query discussed further below is shown in Table 2.

TABLE 2
Text criterion: “cozy”
Geographic criteria: latitude between 47.670750 and 47.671150; longitude between −122.376575 and −122.376490
Relational criterion: Price between $300K and $400K

A user may specify the example query, for example, by typing the word “cozy” in a text field; selecting the price range $300K-$400K from a list of price ranges; and navigating a displayed map to show the region encompassing the specified latitude and longitude ranges.

In steps 602-604, the facility loops through each type of criterion specified in the query; in the sample query, these are the text, geographic, and relational criteria. In step 603, the facility selects from the index for the current criterion type in accordance with the criteria of that type specified in the query to generate a constituent result. In step 604, if additional criterion types remain to be processed, then the facility continues in step 602 to process the next criterion type, else the facility continues in step 605.

Sample constituent results generated based upon the sample query shown in Table 2 and the indices shown in FIGS. 5A-5C are shown in FIGS. 7A-7C. FIG. 7A shows a text constituent result 710. In some embodiments, the facility generates this constituent result by reading the text index until it first encounters the term “cozy,” and copying this row of the index through the last row containing the term “cozy.” In some embodiments, the facility instead jumps to the first row containing the term “cozy,” using an additional index on the index, not shown. It can be seen that the text constituent result contains rows 721-726, corresponding to all of the rows 523-528 that contain the word “cozy” in the text index shown in FIG. 5A. It can be seen that the rows of the text constituent result are ordered in increasing order of home id as a result of having been selected from the text index ordered in the same way.

FIG. 7B shows the geographic constituent result 730. In some embodiments, the facility generates this constituent result by reading each row of the geographic index to determine whether its latitude and longitude both fall within the ranges specified by the query. It can be seen that rows 741-746 all contain home locations within the latitude and longitude ranges specified by the query. It can further be seen that the rows of the geographic constituent result are ordered in increasing order of home id, as a result of the geographic index shown in FIG. 5B being ordered in the same manner.
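The row-by-row selection described above might be sketched as follows. Only the coordinates of home 20 and the query ranges of Table 2 come from the text; the other rows, and the names `GEO_INDEX` and `select_geographic`, are hypothetical.

```python
# Hypothetical geographic index rows (home id, latitude, longitude), kept in
# increasing order of home id. Only home 20's coordinates come from FIG. 5B
# as described in the text; the other rows are invented for illustration.
GEO_INDEX = [
    (12, 47.668100, -122.380200),
    (19, 47.670900, -122.376500),
    (20, 47.670820, -122.376557),
    (31, 47.672400, -122.376510),
    (49, 47.671000, -122.376560),
]


def select_geographic(index, lat_range, lon_range):
    """Scan the index once and keep the ids of homes inside both ranges."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    # The scan preserves index order, so the result stays sorted by home id.
    return [
        home_id
        for home_id, lat, lon in index
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


# Ranges taken from the sample query in Table 2.
geo_result = select_geographic(
    GEO_INDEX, (47.670750, 47.671150), (-122.376575, -122.376490)
)
print(geo_result)
```

A linear scan is used here only because it is the approach the text describes; the point of the sketch is that filtering an id-ordered table yields an id-ordered constituent result with no sorting step.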

FIG. 7C shows the relational constituent result 730. In some embodiments, the facility generates this constituent result by reading each row of the relational index to determine whether its attribute values all satisfy the relational constraints of the query. It can be seen that rows 781-785 each correspond to a home having a price in the specified range. Here too, the rows are ordered in increasing order of home id, as a result of the relational index shown in FIG. 5C having the same order.

In step 605, the facility joins the constituent results generated in step 603. In step 606, the facility returns the results of the join operation performed in step 605 as the final result for the query received in step 601. After step 606, the facility continues in step 601 to receive and process the next query.

Those skilled in the art will appreciate that the steps shown in FIG. 6 and in each of the flow diagrams discussed below may be altered in a variety of ways. For example, the order of the steps may be rearranged; substeps may be performed in parallel; shown steps may be omitted, or other steps may be included; etc.

FIG. 8 is a table diagram showing a final result generated by the facility for the sample query shown in Table 2 by joining the constituent results shown in FIGS. 7A-7C. It contains rows 801-803, containing the following home ids that are common to each of the three constituent results: 19, 20 and 49.
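The join of step 605 can take advantage of the common ordering of the constituent results. The following is a minimal sketch, assuming each constituent result is reduced to a list of home ids in increasing order. The id lists are hypothetical (the contents of FIGS. 7A-7C are not reproduced here) and were chosen only so that their intersection matches the home ids 19, 20 and 49 of FIG. 8; the function names are likewise illustrative.

```python
from functools import reduce


def intersect_sorted(a, b):
    """Intersect two lists of ids that are both in increasing order."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, so skip it
        else:
            j += 1  # b[j] cannot appear later in a, so skip it
    return common


def join_constituents(*constituents):
    """Join any number of id-ordered constituent results without sorting."""
    return reduce(intersect_sorted, constituents)


# Hypothetical constituent results, each already ordered by home id.
text_result = [12, 19, 20, 31, 49, 57]
geographic_result = [19, 20, 23, 41, 49, 60]
relational_result = [7, 19, 20, 49, 88]

final_result = join_constituents(text_result, geographic_result, relational_result)
print(final_result)
```

Each pairwise intersection advances through both lists once, so the join takes time linear in the sizes of the constituent results rather than the cost of first sorting them.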

In some embodiments, the facility provides a user interface that permits a user to create a query by specifying values or ranges of values for each of a number of item attributes. FIGS. 9-13 show sample displays presented as part of the user interface.

FIG. 9 is a display diagram showing an initial display of the user interface presented by the facility. The display 900 corresponds to a search query and its result. A map 920 has been generated for inclusion in the display based upon the user having entered the address for a home shown near the center of the map in fields 901 and 902, then activating button 903. In response, the facility displayed the map 920 that is shown, centered on the home in question at an intermediate zoom level. The borders of the map have established a geographic criterion for the search. Accordingly, the current search result is a list 950 (only partially shown here) of all the homes that are located within the present borders of the map. The user can navigate to different pages of the query result using controls 904 and 905. The user may alter the geographic query criterion by navigating the map, such as by scrolling the map in a direction using control 921, or by changing its zoom level using zoom control 922. When the user changes the boundaries of the map in this way, the facility updates the geographic query criterion to include the present boundaries of the map, and executes the updated query to generate a new search result that includes the homes within the new boundaries. The user may also specify relational criteria for the search query using panel 910, which is discussed in greater detail below in connection with FIGS. 10-13. Though not shown in FIG. 9, in some embodiments, the facility includes in the query specification user interface it presents a mechanism usable by the user to specify a textual criterion for the query, such as a text field.

FIG. 10 is a display diagram that shows panel 910, which is usable by the user to specify relational criteria for the query, in greater detail as panel 1010. Panel 1010 includes indications 1011-1014 of a number of different home statuses. For example, the for sale status 1011 is active for any home that is known to be presently for sale. The checkbox at the left end of the indication indicates that homes having this status are included in the search result. The parenthetical number at the right end of indication 1011 indicates that eight of the homes presently in the search result have this status. Panel 1010 further has a number of subpanels 1020, 1030, 1040, 1050, 1060, 1070, and 1080, each corresponding to a different relational attribute, which are each shown here in collapsed form. By selecting the control at the left end of one of these subpanels, the user can expand it in order to specify additional relational criteria. For example, the user may select control 1021 in order to specify a relational criterion for the query that is based upon the price attribute.

FIG. 11 is a display diagram showing an expanded version of the price attribute subpanel displayed by the facility when the user selects control 1021. The expanded price attribute subpanel 1180 lists a number of subranges of the price attribute that may be selected by the user in order to specify a query criterion for the price attribute. A first indication 1181 may be selected by the user to collapse subpanel 1180 without specifying a subrange of the price attribute for inclusion in the query. On the other hand, the user may select any of indications 1182-1184 in order to specify a query criterion for the displayed range. For example, the user may select indication 1184 in order to add to the query a criterion requiring a price attribute to be between $300 k and $400 k. The parenthetical at the right end of this indication indicates that, among the homes contained in the current query result, six of them fall into this range and would satisfy such a criterion. The user may also enter a custom range into fields 1185 and 1186, and select control 1187 in order to create a query criterion for the custom range.

FIG. 12 is a display diagram that shows panel 1210 after the user has selected indication 1184 in FIG. 11. By comparing FIG. 12 to FIG. 10, it can be observed that the facility has updated the counts displayed for each of statuses 1211-1213 shown in the panel to reflect the number of homes having these statuses in the query result for the updated query containing the price criterion specified by the user in selecting indication 1184. Additionally, it can be seen that collapsed price subpanel 1220 now contains an indication that this attribute has been constrained to the specified range. The user may go on to specify additional criteria, or select the clear all filters control 1291 to delete the existing relational criterion from the query.

FIG. 13 shows a sample display presented by the facility when the user goes on to select control 1251 to expand the collapsed size subpanel 1250. The expanded subpanel 1350 contains ranges for the size attribute and, for each range, an indication of the number of homes in the current query result that fall into that range. For example, indication 1353 shows that two homes in the current search result have a size attribute value between two thousand and three thousand square feet. Again, the user may make a selection in expanded size subpanel 1350 to add to the query another relational criterion specifying a particular subrange for the size attribute.

FIG. 14 is a flow diagram that shows steps typically performed by the facility in order to present the query specification user interface described above. In step 1401, the facility receives the specification of an attribute value or range from the user, such as is described above in connection with FIGS. 9, 11, and 13. In step 1402, the facility performs a query that includes any attribute values and ranges specified by the user. In step 1403, the facility identifies attributes that are or can be displayed for which the user has specified no value or range. In step 1404, the facility establishes counters for values or ranges of identified attributes.

FIG. 15 is a flow diagram showing additional details of step 1404. In steps 1501-1506, the facility loops through each attribute identified in step 1403. In step 1502, if the values for this attribute are enumerated (e.g., yes/no, or condominium/single family), then the facility continues in step 1503, else (e.g., for price, size, etc., attributes), the facility continues in step 1504. In step 1503, the facility establishes a counter for each enumerated value of the attribute. After step 1503, the facility continues in step 1506.

In step 1504, the facility divides the range of possible values for the attribute into a large number of subranges, also called “bins.” As one example, the facility may establish 101 subranges for the price attribute: 100 $20,000-wide subranges between zero and $2,000,000, and a subrange over $2,000,000. In step 1505, the facility establishes a counter for each bin established in step 1504. In step 1506, if additional identified attributes remain to be processed, then the facility continues in step 1501 to process the next identified attribute, else the facility returns.

Returning to FIG. 14, in step 1405, in a single pass through the query result generated in step 1402, the facility updates all of the counters established in step 1404. In step 1406, the facility consolidates attribute value bins and their counters in order to be able to display a reasonable number of subranges.
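Steps 1404 and 1405 might be sketched as follows for one enumerated attribute (type) and one binned attribute (price). The sample homes, the dictionary layout of a result item, the function names, and the choice to place a price of exactly $2,000,000 in the last bin are assumptions made for this sketch.

```python
from collections import Counter

BIN_WIDTH = 20_000  # width of each fixed-width price bin
NUM_BINS = 101      # 100 fixed-width bins plus one bin for higher prices


def price_bin(price):
    """Return the bin number for a price.

    Assumption: a price of exactly $2,000,000 or more falls in the last bin.
    """
    return min(int(price // BIN_WIDTH), NUM_BINS - 1)


def count_in_single_pass(result):
    """Update every counter while reading each item of the result once."""
    type_counts = Counter()          # one counter per enumerated value
    price_counts = [0] * NUM_BINS    # one counter per bin
    for home in result:
        type_counts[home["type"]] += 1
        price_counts[price_bin(home["price"])] += 1
    return type_counts, price_counts


# Hypothetical query result; only the $348,310 price echoes FIG. 5C.
sample_result = [
    {"type": "single family", "price": 348_310},
    {"type": "condominium", "price": 415_000},
    {"type": "single family", "price": 2_500_000},
    {"type": "single family", "price": 359_000},
]

type_counts, price_counts = count_in_single_pass(sample_result)
print(dict(type_counts))
print({i: c for i, c in enumerate(price_counts) if c})
```

Because all counters are updated inside one loop, adding more displayed attributes adds work per item but does not add further passes over the result.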

FIG. 16 is a flow diagram showing details of step 1406. In steps 1601-1604, the facility loops through each attribute identified in step 1403. In step 1602, if the values for this attribute are enumerated, then the facility continues in step 1604, else the facility continues in step 1603. In step 1603, the facility consolidates the bins established for the attribute into a smaller number of bins, each containing a similar number of items. For example, from the 101 bins established for the price attribute, the facility may form four consolidated bins, each containing approximately one quartile of the homes counted among all 101 original bins. In step 1604, if additional identified attributes remain to be processed, then the facility continues in step 1601 to process the next identified attribute, else the facility returns.
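The text does not specify how the bins are consolidated in step 1603. One possible approach, offered only as a sketch, walks the bins in order and closes a consolidated bin each time the running total reaches the next quartile of the overall count. The function name `consolidate_bins` and the sample counts are hypothetical.

```python
def consolidate_bins(counts, target_bins=4):
    """Merge adjacent bins into at most target_bins groups of similar size.

    Returns a list of (first_bin, last_bin, item_count) tuples. This greedy
    rule is an assumption, not a method stated in the text.
    """
    total = sum(counts)
    if total == 0:
        return []
    merged = []
    start = 0        # first original bin of the group being built
    running = 0      # items counted in the group being built
    cumulative = 0   # items counted in all bins seen so far
    for i, count in enumerate(counts):
        running += count
        cumulative += count
        in_last_group = len(merged) == target_bins - 1
        threshold = total * (len(merged) + 1) / target_bins
        # Close the group once it is non-empty and reaches its share.
        if not in_last_group and running > 0 and cumulative >= threshold:
            merged.append((start, i, running))
            start, running = i + 1, 0
    if running > 0:
        # Whatever remains forms the final group, up to the last bin.
        merged.append((start, len(counts) - 1, running))
    return merged


# Hypothetical counters for the 101 price bins after the single pass.
sample_counts = [0] * 101
for bin_number, items in [(10, 3), (12, 5), (15, 4), (17, 4), (20, 2), (25, 2)]:
    sample_counts[bin_number] = items

print(consolidate_bins(sample_counts))
```

With $20,000-wide bins, a group spanning bins 13 through 15 would be displayed as the range $260,000 to $320,000. The groups are only roughly equal in size, since a single heavily populated original bin is never split.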

Returning to FIG. 14, in step 1407, the facility displays the values and ranges for the identified attributes and their counters as shown in FIGS. 12 and 13.

In some embodiments, as shown in FIG. 9, the facility uses a paging technique to display the results generated for a search query. Using controls presented as part of the user interface for presenting the search result, the user can request the display of any mth page of n items of the search result. In various embodiments, the number of items n on a page of the search result is preestablished by the designer of the facility, and/or configurable by the user.

FIG. 17 is a flow diagram showing steps typically performed by the facility in order to select the items to be displayed in a particular page of a search result. In step 1701, the facility receives a copy of the query; a page size, that is, the number of items to be shown on each page of the search result; a page number of the search result to be returned; one or more attributes of the data items on which to sort the data items; and a sort direction that indicates whether the search result will be sorted in increasing or decreasing order of the sort attribute. For example, the request received in step 1701 may specify to return page number two where each page contains five items, and sort in increasing order of a particular attribute. In step 1702, the facility processes the query to obtain the corresponding search results, such as by using the process described above in connection with FIGS. 4-8. A sample search result generated in step 1702 is shown below in Table 3.

TABLE 3: initial order

position    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
id         12 97 44 68  6 72 41 43 42 22 98  6 33 81 25 24 69 88 11 31
sort value  2  3 10 18 14  9  3  3 20 17 15  3 19 12  8 16 10  1 13  3

Table 3 shows a search result containing 20 items. The table shows, for each of the 20 positions in the search result, the item ID of the item in that position, as well as the value of the attribute on which the items are being sorted for that item.

In steps 1703-1705, the facility loops through each page of the search result from the first page to the page to return. In step 1704, the facility populates the current page by sorting the portion of the search result from the beginning of the current page to the end of the search result up to the point at which the current page contains the largest or smallest values in this range as specified by the sort direction. The facility typically uses a repeatable, or “stable,” sort algorithm to avoid the problem of instability. In some embodiments, the facility uses a truncated quicksort algorithm to perform this sorting. In some embodiments, the facility employs a fat pivot as part of the sorting process. In some embodiments, the facility uses a deterministic basis for selecting a pivot as part of the sorting process. In step 1705, if additional pages remain to be processed, then the facility continues in step 1703 to process the next page, else the facility continues in step 1706.
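One way step 1704 might be realized is sketched below in Python. The sketch assumes an increasing sort direction, items represented as (id, sort value) pairs, a fat (three-way) pivot, and the central position of the active range as the deterministic pivot location. The name `populate_page` is hypothetical. Because the description does not fix the exact pivot and exchange sequence, the arrangement this sketch produces within and beyond the page need not match the tables of the example.

```python
def populate_page(seq, start, n):
    """Rearrange seq[start:] in place so that seq[start:start + n] holds the
    n smallest sort values found in seq[start:]. Positions before `start`
    are never touched. The work stops as soon as the page boundary is
    settled, so neither the page nor the remainder ends up fully sorted.
    """
    end = min(start + n, len(seq))      # first position beyond the page
    lo, hi = start, len(seq) - 1        # active range still being partitioned
    while lo < hi and end < len(seq):
        pivot = seq[(lo + hi) // 2][1]  # deterministic choice: central position
        # Fat-pivot partition of seq[lo..hi] into  < pivot | == pivot | > pivot.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            value = seq[i][1]
            if value < pivot:
                seq[lt], seq[i] = seq[i], seq[lt]
                lt += 1
                i += 1
            elif value > pivot:
                seq[i], seq[gt] = seq[gt], seq[i]
                gt -= 1
            else:
                i += 1
        if end < lt:          # boundary lies inside the "< pivot" band
            hi = lt - 1
        elif end > gt + 1:    # boundary lies inside the "> pivot" band
            lo = gt + 1
        else:                 # boundary touches the "== pivot" band: settled
            break


# The sample search result of Table 3 as (id, sort value) pairs.
TABLE_3 = [(12, 2), (97, 3), (44, 10), (68, 18), (6, 14), (72, 9), (41, 3),
           (43, 3), (42, 20), (22, 17), (98, 15), (6, 3), (33, 19), (81, 12),
           (25, 8), (24, 16), (69, 10), (88, 1), (11, 13), (31, 3)]

working = list(TABLE_3)
populate_page(working, 0, 5)   # first iteration of the loop
populate_page(working, 5, 5)   # second iteration of the loop
print(working[:5], working[5:10])
```

Since the pivot position depends only on the bounds of the active range, running the sketch again on the same input performs the same exchanges, which is the repeatability property the paragraph above relies on.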

Table 4 below shows the results of the first iteration of the loop of steps 1703-1705.

TABLE 4: populate first page

position    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
id         12  6 43 88 97 22 31 81 42 44 68  6 69 33 72 41 11 24 98 25
sort value  2  3  3  1  3 17  3 12 20 10 18 14 10 19  9  3 13 16 15  8

It can be seen by comparing Table 4 to Table 3 that data items 6, 12, 43, 88, and 97, which have the lowest sort values 1-3 among the items in the search result, have been sorted into the first page of five items in the search result. It is further noted that (a) the items within the first page are not ordered in accordance with sort value, and (b) the items beyond the first page are also not ordered in accordance with sort value.

Table 5 below shows the results of the second iteration of the loop of steps 1703-1705.

TABLE 5: populate second page

position    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
id         12  6 43 88 97 69 25 41 72 31  6 42 81 24 11 68 22 98 44 33
sort value  2  3  3  1  3 10  8  3  9  3 14 20 12 16 13 18 17 15 10 19

It can be seen by comparing Table 5 to Table 4 that the second iteration of the loop has not affected the data items on the first page of the search results. It can further be seen that data items 25, 31, 41, 69, and 72, which have the lowest sort values 3-10 among the items in positions 6-20 of the search result, have been sorted into the second page of five items in the search result. It is further noted that (a) the items within the second page are not ordered in accordance with sort value, and (b) the items beyond the second page are also not ordered in accordance with sort value. In the example, after the second iteration of the loop, the loop concludes, because the second page is the page to return.

In step 1706, the facility sorts the items within the page to return. In some embodiments, the facility uses a sort algorithm such as insertion sort in step 1706. Table 6 below shows the result of sorting the items within the second page of the sample search result.

TABLE 6: sort second page

position    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
id         12  6 43 88 97 41 31 25 72 69  6 42 81 24 11 68 22 98 44 33
sort value  2  3  3  1  3  3  3  8  9 10 14 20 12 16 13 18 17 15 10 19

It can be seen by comparing Table 6 to Table 5 that the facility has sorted the data items in the second page in increasing order of their sort values.
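The within-page sort of step 1706 can be sketched in Python as an insertion sort restricted to the positions of the page to return. The function name `insertion_sort_page` and the (id, sort value) pair representation are assumptions made for illustration; the input is the arrangement of Table 5, using 0-based indices so that positions 6-10 correspond to indices 5-9.

```python
def insertion_sort_page(seq, start, n):
    """Sort seq[start:start + n] in place by increasing sort value.

    Only the positions of the requested page are reordered; items with equal
    sort values keep their relative order.
    """
    end = min(start + n, len(seq))
    for i in range(start + 1, end):
        item = seq[i]
        j = i - 1
        # Shift larger items one position right to open a slot for `item`.
        while j >= start and seq[j][1] > item[1]:
            seq[j + 1] = seq[j]
            j -= 1
        seq[j + 1] = item


# Arrangement of Table 5 as (id, sort value) pairs.
TABLE_5 = [(12, 2), (6, 3), (43, 3), (88, 1), (97, 3), (69, 10), (25, 8),
           (41, 3), (72, 9), (31, 3), (6, 14), (42, 20), (81, 12), (24, 16),
           (11, 13), (68, 18), (22, 17), (98, 15), (44, 10), (33, 19)]

result = list(TABLE_5)
insertion_sort_page(result, 5, 5)          # second page of five items
print([item_id for item_id, _ in result[5:10]])
```

Insertion sort is a reasonable fit here because a page holds only n items, so its quadratic worst case is bounded by the page size rather than by the size of the whole search result.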

In step 1707, the facility returns the sorted page. In the example, the facility returns positions 6-10 in the search results shown in Table 6. After step 1707, these steps conclude.

By reviewing Tables 4-6, it can be seen how the facility overcomes the instability problem. If the user later again requests the second page of the same search result, the facility again performs two iterations of the loop. Because of the repeatability of the sort algorithm used in step 1704, the first iteration of the loop has the same result as before, again committing items 6, 43, and 97 having sort value 3 to the first page rather than item 31 or item 41, which both also have sort value 3. Similarly, the second iteration of the loop has the same result as before: items 31 and 41, the remaining items with sort value 3, are committed to the second page, as is item 69 having sort value 10 to the exclusion of item 44 having sort value 10. If the user later requests the first page of the same search result, the facility executes the loop once, producing the same result shown above in Table 4, such that the items having sort value 3 included in the first page would be 6, 43, and 97, not 31 or 41, which appeared earlier in the second page. If the user later requests the third page of the same search result, the facility executes the loop three times. In the second iteration of these three, the facility commits item 69 having sort value 10 to the second page, making it unavailable for inclusion in the population of the third page in the third iteration of the loop, and ensuring that item 44 having sort value 10 is populated into the third page.
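The repeatability argument can be checked with a short Python sketch of the whole loop of FIG. 17. As a loudly labeled substitution, the page population step here uses `heapq.nsmallest` with ties broken by current position instead of a truncated quicksort, so the particular tied items committed to each page differ from those in Tables 4-6. What the sketch is meant to show is only that any deterministic population step makes pages repeatable and mutually exclusive. The names `populate_page` and `select_page` are hypothetical.

```python
import heapq

# The sample search result of Table 3 as (id, sort value) pairs.
TABLE_3 = [(12, 2), (97, 3), (44, 10), (68, 18), (6, 14), (72, 9), (41, 3),
           (43, 3), (42, 20), (22, 17), (98, 15), (6, 3), (33, 19), (81, 12),
           (25, 8), (24, 16), (69, 10), (88, 1), (11, 13), (31, 3)]


def populate_page(working, start, n):
    """Stand-in for step 1704 (not a truncated quicksort): move the n smallest
    remaining items to the page beginning at `start`. Ties are broken by
    current position, so the outcome is fully deterministic."""
    rest = working[start:]
    chosen = set(heapq.nsmallest(n, range(len(rest)),
                                 key=lambda i: (rest[i][1], i)))
    page = [item for i, item in enumerate(rest) if i in chosen]
    remainder = [item for i, item in enumerate(rest) if i not in chosen]
    working[start:] = page + remainder


def select_page(search_result, page_number, page_size):
    """Return the items of the 1-based `page_number`, sorted by sort value."""
    working = list(search_result)            # leave the caller's list untouched
    for page in range(page_number):          # loop of steps 1703-1705
        populate_page(working, page * page_size, page_size)
    start = (page_number - 1) * page_size
    # Step 1706: order the page itself; sorted() keeps ties in place.
    return sorted(working[start:start + page_size], key=lambda item: item[1])


first, second, third = (select_page(TABLE_3, m, 5) for m in (1, 2, 3))
print(second == select_page(TABLE_3, 2, 5))   # same request, same page
print(set(first) & set(second), set(second) & set(third))
```

Each request replays the loop from the first page, so a tied item committed to an earlier page is never available to a later one, regardless of the order in which the user requests pages.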

It will be appreciated by those skilled in the art that the above-described facility may be straightforwardly adapted or extended in various ways. For example, the facility may be used to search data items of a variety of types, using queries containing criteria of a variety of types. In processing a query, the facility may use a variety of data resources, including both data sources of various types and indices that are of various types. User interfaces presented by the facility for query specification and search result display may have a wide variety of organizations and appearances. Search results page population and sorting performed by the facility may utilize a wide variety of sorting algorithms and associated techniques. While the foregoing description makes reference to particular embodiments, the scope of the invention is defined solely by the claims that follow and the elements recited therein.

I claim:
 1. A method in a computing system for identifying from an initial sequence in arbitrary order of items each having a sort value those items that should be included on the mth page of n items of a fully-sorted sequence of the items, comprising: (a) initializing a working sequence having the same order as the initial sequence; (b) after (a), for each of pages 1 to m: determining whether any item in the first n positions of the working sequence has a sort value smaller than the sort value of any position of the working sequence greater than n, if it is determined that an item in the first n positions of the working sequence has a sort value smaller than the sort value of any position of the working sequence greater than n, until none of the items in the first n positions of the working sequence has a sort value smaller than the sort value of any position of the working sequence greater than n, subjecting the working sequence to a repeatable exchange sort; once none of the items in the first n positions of the working sequence has a sort value smaller than the sort value of any position of the working sequence greater than n, if the current page is less than m, removing the first n positions from the working sequence; and (c) after (b), identifying the items in the first n positions of the working sequence.
 2. The method of claim 1, further comprising: sorting the identified items until they are ordered in accordance with their sort value; and displaying the sorted identified items.
 3. The method of claim 2 wherein the identified items are sorted using an insertion sort.
 4. The method of claim 1 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort.
 5. The method of claim 1 wherein the repeatable exchange sort to which the working sequence is subjected is a truncated quicksort.
 6. The method of claim 1 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort using a fat pivot.
 7. The method of claim 1 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort that deterministically selects pivot locations.
 8. The method of claim 1 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort that selects a central position in the working sequence as the pivot location.
 9. The method of claim 1, further comprising: receiving a query; receiving a page size; receiving a page number to return; receiving a sort attribute; and receiving a sort direction.
 10. A computing system, having a memory and a processor, configured to identify from an initial sequence in arbitrary order of items each having a sort value those items that should be included on the mth page of n items of a fully-sorted sequence of the items, the computing system comprising: an initialization component configured to initialize a working sequence having the same order as the initial sequence and to initialize a sorting window to empty; and a sorting component configured to, after the working sequence and the sorting window are initialized, for each page of pages 1 to m: subject the working sequence, exclusive of the sorting window, to a repeatable exchange sort only until none of the items in the first n positions of the working sequence, exclusive of the sorting window, has a sort value smaller than the sort value of any position of the working sequence, exclusive of the sorting window, greater than n; and expand the sorting window to include the positions in the current page; wherein the initialization component and the sorting component each comprise computer-executable instructions stored in the memory for execution by the processor.
 11. A computer-readable storage device storing content that, when executed by a computing system having a processor, causes the computing system to perform a method for identifying, from an initial sequence in arbitrary order of items each having a sort value, those items that should be included on the mth page of n items of a fully-sorted sequence of the items, the method comprising: (a) initializing a working sequence having the same order as the initial sequence; (b) after (a), for each of pages 1 to m: subjecting the working sequence to a repeatable exchange sort only until none of the items in the first n positions of the working sequence has a sort value larger than the sort value of any position of the working sequence greater than n; once none of the items in the first n positions of the working sequence has a sort value larger than the sort value of any position of the working sequence greater than n, if the current page is less than m, removing the first n positions from the working sequence; and (c) after (b), identifying the items in the first n positions of the working sequence.
 12. The computer-readable storage device of claim 11, the method further comprising: sorting the identified items until they are ordered in accordance with their sort value; and displaying the sorted identified items.
 13. The computer-readable storage device of claim 11 wherein the identified items are sorted using an insertion sort.
 14. The computer-readable storage device of claim 11 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort.
 15. The computer-readable storage device of claim 11 wherein the repeatable exchange sort to which the working sequence is subjected is a truncated quicksort.
 16. The computer-readable storage device of claim 11 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort using a fat pivot.
 17. The computer-readable storage device of claim 11 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort that deterministically selects pivot locations.
 18. The computer-readable storage device of claim 11 wherein the repeatable exchange sort to which the working sequence is subjected is a quicksort that selects a central position in the working sequence as the pivot location.
 19. The computer-readable storage device of claim 11, the method further comprising: identifying those items that should be included on the mth page of n items; and sorting the identified items that should be included on the mth page of n items.
 20. The computer-readable storage device of claim 19, wherein sorting the identified items that should be included on the mth page of n items comprises using an insertion sort to sort the identified items that should be included on the mth page of n items.